MRRC: multiple role representation crossover interpretation for image captioning with R-CNN feature distribution composition (FDC)
Abstract
While machine image captioning requires structured learning and a basis for interpretation, improving it requires understanding and processing multiple contexts in a meaningful way. This research provides a novel concept for context combination that can benefit many applications that treat visual features as equivalent to descriptions of objects, activities, and events. Our architecture has three components: the Feature Distribution Composition (FDC) Layer Attention, the Multiple Role Representation Crossover (MRRC) Attention Layer, and the Language Decoder. FDC helps generate weighted attention from RCNN features, MRRC acts as an intermediate representation for the next-word attention, and the Language Decoder estimates the likelihood of the probable sentence. We demonstrate the effectiveness of FDC, MRRC, regional object feature attention, and reinforcement learning for generating better captions from images. The model improved on previous performance by 35.3% and sets a new standard and theory for generation based on logic, interpretability, and contexts.
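As a rough illustration of what "weighted attention from RCNN features" can mean, the sketch below computes generic dot-product soft attention over a few region feature vectors. It is not the paper's FDC or MRRC layer: the function names `softmax` and `attend_regions`, the toy feature values, and the use of a single query vector standing in for the decoder state are all assumptions made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_regions(region_feats, query):
    """Generic dot-product soft attention (hypothetical helper).

    region_feats: list of per-region feature vectors (e.g. from an R-CNN).
    query: vector standing in for the decoder state (an assumption).
    Returns the attention weights and the weighted-sum context vector.
    """
    # One relevance score per region: dot product with the query.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in region_feats]
    weights = softmax(scores)
    dim = len(region_feats[0])
    # Context vector: attention-weighted average of the region features.
    context = [
        sum(w * feat[d] for w, feat in zip(weights, region_feats))
        for d in range(dim)
    ]
    return weights, context

# Toy example: three regions with 2-dimensional features.
region_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
weights, context = attend_regions(region_feats, query)
```

In a captioning model, a context vector of this kind would typically be fed to the language decoder at each step; how the paper composes feature distributions or crosses role representations is not specified in the abstract.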
Similar resources
Recurrent Highway Networks with Language CNN for Image Captioning
Language models based on recurrent neural networks have dominated recent image caption generation tasks. In this paper, we introduce a language CNN model which is suitable for statistical language modeling tasks and shows competitive performance in image captioning. In contrast to previous models which predict next word based on one previous word and hidden state, our language CNN is fed with a...
Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning
The workflow of extracting features from images using convolutional neural networks (CNN) and generating captions with recurrent neural networks (RNN) has become a de facto standard for the image captioning task. However, since CNN features are originally designed for the classification task, they are mostly concerned with the main conspicuous element of the image, and often fail to correctly convey info...
Joint Learning of CNN and LSTM for Image Captioning
In this paper, we describe the details of our methods for the participation in the subtask of the ImageCLEF 2016 Scalable Image Annotation task: Natural Language Caption Generation. The model we used is the combination of a procedure of encoding and a procedure of decoding, which includes a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) based Recurrent Neural Network. We f...
A Distributed Representation Based Query Expansion Approach for Image Captioning
In Figure 1, we present more example results obtained with our approach on the benchmark datasets Flickr8K (Hodosh et al., 2013), Flickr30K (Young et al., 2014), and MS COCO (Lin et al., 2014). We also provide ground-truth human descriptions for comparison. There are some cases where our approach falls short. In some of those cases, although the system does not produce the most desirable results, it...
Multi-layered Image Representation for Image Interpretation
In order to bridge the semantic gap between the visual context of an image and semantic concepts people would use to interpret it, we propose a multi-layered image representation model considering different amounts of knowledge needed for the interpretation of the image at each layer. Interpretation results on different semantic layers of Corel images related to outdoor scenes are presented and...
Journal
Journal title: Multimedia Tools and Applications
Year: 2021
ISSN: 1380-7501, 1573-7721
DOI: https://doi.org/10.1007/s11042-021-10578-9